Results 1 - 6 of 6
1.
J Am Med Inform Assoc; 2022 Oct 10.
Article in English | MEDLINE | ID: covidwho-2325500

ABSTRACT

OBJECTIVE: Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without sharing data. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described COVID-19 diagnostic model and, to investigate the effect of system heterogeneity, to evaluate the performance of 4 FL variations. MATERIALS AND METHODS: We leverage an FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the FedAvg algorithm on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data were pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model, FedAvg, and 3 personalized FL variations (FedProx, FedBN, FedAMP). RESULTS: We observed comparable model performance on internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, p = 0.5) and improved model generalizability with the FedAvg model (p < 0.05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation compared to the personalized FL algorithms, although FedAvg did have improved generalizability compared to the personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. CONCLUSION: FedAvg can significantly improve model generalization compared to personalized FL algorithms, but at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internally and externally validated algorithms.
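The FedAvg aggregation at the heart of this study has a simple closed form: each round, every client trains on its local data and the server averages the clients' parameters, weighted by local dataset size. The sketch below is a minimal illustration with made-up toy numbers and illustrative names; it is not the Clara Train SDK 4.0 implementation the study actually used.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round: average client parameters,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    # Each client's parameters are a list of arrays (one per layer).
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Toy round: 3 clients, a single one-layer "model" each.
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])], [np.array([5.0, 6.0])]]
sizes = [10, 10, 20]          # larger clients pull the average harder
global_w = fedavg(clients, sizes)
```

Personalized variants such as FedProx, FedBN, and FedAMP modify this step (e.g., FedBN keeps batch-normalization layers local instead of averaging them), which is what trades generalizability for internal validity in the results above.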

2.
Radiol Artif Intell; 4(4): e210217, 2022 Jul.
Article in English | MEDLINE | ID: covidwho-1968372

ABSTRACT

Purpose: To conduct a prospective observational study across 12 U.S. hospitals to evaluate the real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs. Materials and Methods: A total of 95 363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and performance was prospectively evaluated. There were 5335 real-time predictions in total and a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed using receiver operating characteristic analysis, precision-recall curves, and F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists. Results: Participants positive for COVID-19 had higher COVID-19 diagnostic scores than participants negative for COVID-19 (median, 0.1 [IQR, 0.0-0.8] vs 0.0 [IQR, 0.0-0.1], respectively; P < .001). Real-time model performance was unchanged over 19 weeks of implementation (area under the receiver operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model sensitivity was higher in men than in women (P = .01), whereas model specificity was higher in women (P = .001). Sensitivity was higher for Asian (P = .002) and Black (P = .046) participants compared with White participants. The COVID-19 AI diagnostic system had worse accuracy (63.5% correct) than radiologist predictions (radiologist 1 = 67.8% correct, radiologist 2 = 68.6% correct; McNemar P < .001 for both). Conclusion: AI-based tools have not yet reached full diagnostic potential for COVID-19 and underperform compared with radiologist prediction. Keywords: Diagnosis, Classification, Application Domain, Infection, Lung. Supplemental material is available for this article. © RSNA, 2022.
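The evaluation above rests on receiver operating characteristic analysis and per-subgroup sensitivity. A minimal numpy sketch of both metrics follows; the function names and threshold are illustrative assumptions, since the paper does not describe its analysis pipeline at code level.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case scores above a randomly chosen negative."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # Count pairwise wins; ties count as half a win.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def subgroup_sensitivity(scores, labels, group, threshold=0.5):
    """Sensitivity (recall on positives) restricted to one demographic
    subgroup, as used for the sex and race comparisons above."""
    m = (np.asarray(labels) == 1) & group
    return float(np.mean(np.asarray(scores)[m] >= threshold))
```

Comparing `subgroup_sensitivity` across groups at a fixed operating threshold is one way differences like the reported male-vs-female sensitivity gap can surface.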

3.
J Am Coll Radiol; 19(1 Pt B): 184-191, 2022 01.
Article in English | MEDLINE | ID: covidwho-1627037

ABSTRACT

PURPOSE: The aim of this study was to assess racial/ethnic and socioeconomic disparities in the difference between atherosclerotic vascular disease prevalence measured by a multitask convolutional neural network (CNN) deep learning model using frontal chest radiographs (CXRs) and the prevalence reflected by administrative hierarchical condition category (HCC) codes in two cohorts of patients with coronavirus disease 2019 (COVID-19). METHODS: A previously published CNN model was trained to predict atherosclerotic disease from ambulatory frontal CXRs. The model was then validated on two cohorts of patients with COVID-19: 814 ambulatory patients from a suburban location (presenting from March 14, 2020, to October 24, 2020, the internal ambulatory cohort) and 485 hospitalized patients from an inner-city location (hospitalized from March 14, 2020, to August 12, 2020, the external hospitalized cohort). The CNN model predictions were validated against electronic health record administrative codes in both cohorts and assessed using the area under the receiver operating characteristic curve (AUC). The CXRs from the ambulatory cohort were also reviewed by two board-certified radiologists and compared with the CNN-predicted values for the same cohort to produce a receiver operating characteristic curve and the AUC. The atherosclerosis diagnosis discrepancy, Δvasc, referring to the difference between the predicted value and the presence or absence of the vascular disease HCC categorical code, was calculated. Linear regression was performed to determine the association of Δvasc with the covariates of age, sex, race/ethnicity, language preference, and social deprivation index. Logistic regression was used to assess the association between the presence of any HCC codes and Δvasc and the other covariates.
RESULTS: The CNN prediction for vascular disease from frontal CXRs in the ambulatory cohort had an AUC of 0.85 (95% confidence interval, 0.82-0.89) and in the hospitalized cohort had an AUC of 0.69 (95% confidence interval, 0.64-0.75) against the electronic health record data. In the ambulatory cohort, the consensus radiologists' reading had an AUC of 0.89 (95% confidence interval, 0.86-0.92) relative to the CNN. Multivariate linear regression of Δvasc in the ambulatory cohort demonstrated significant negative associations with non-English-language preference (β = -0.083, P < .05) and Black or Hispanic race/ethnicity (β = -0.048, P < .05) and positive associations with age (β = 0.005, P < .001) and sex (β = 0.044, P < .05). For the hospitalized cohort, age was also significant (β = 0.003, P < .01), as was social deprivation index (β = 0.002, P < .05). The Δvasc variable (odds ratio [OR], 0.34), Black or Hispanic race/ethnicity (OR, 1.58), non-English-language preference (OR, 1.74), and site (OR, 0.22) were independent predictors of having one or more hierarchical condition category codes (P < .01 for all) in the combined patient cohort. CONCLUSIONS: A CNN model was predictive of aortic atherosclerosis in two cohorts (one ambulatory and one hospitalized) with COVID-19. The discrepancy between the CNN model and the administrative code, Δvasc, was associated with language preference in the ambulatory cohort; in the hospitalized cohort, this discrepancy was associated with social deprivation index. The absence of administrative code(s) was associated with Δvasc in the combined cohorts, suggesting that Δvasc is an independent predictor of health disparities. This may suggest that biomarkers extracted from routine imaging studies and compared with electronic health record data could play a role in enhancing value-based health care for traditionally underserved or disadvantaged patients for whom barriers to care exist.


Subject(s)
COVID-19; Carcinoma, Hepatocellular; Deep Learning; Liver Neoplasms; Ethnicity; Humans; Radiography; Retrospective Studies; SARS-CoV-2; Social Deprivation
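The Δvasc discrepancy and the linear regression on covariates described in this abstract reduce to a subtraction and an ordinary least squares fit. A minimal numpy sketch under those assumptions (function names and toy inputs are illustrative, not the study's code):

```python
import numpy as np

def delta_vasc(cnn_scores, hcc_codes):
    """Diagnosis discrepancy: CNN-predicted probability of vascular
    disease minus the 0/1 presence of the vascular-disease HCC code."""
    return np.asarray(cnn_scores, float) - np.asarray(hcc_codes, float)

def ols_coefficients(X, y):
    """Ordinary least squares fit of delta_vasc on covariates such as
    age, sex, race/ethnicity, language preference, and deprivation index."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta                                  # [intercept, coef_1, ...]
```

A negative Δvasc means the administrative record carries the code while the imaging model assigns a low probability, which is the direction of discrepancy the study links to language preference and race/ethnicity.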
4.
J Biomed Inform; 123: 103918, 2021 11.
Article in English | MEDLINE | ID: covidwho-1433456

ABSTRACT

OBJECTIVE: With increasing patient complexity and data stored in fragmented health information systems, automated and time-efficient ways of gathering important information from patients' medical history are needed for effective clinical decision making. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback to allow clinicians to ask natural questions to retrieve data from patient notes. MATERIALS AND METHODS: We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, using K-Means clustering to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance. RESULTS: In an iterative feedback loop experiment, the MAP for the final iteration was 0.93/0.94 versus an initial MAP of 0.66/0.52 for generic queries, and 1.0/1.0 versus 0.79/0.83 for COVID-19-specific queries, confirming that the contextual model handles ambiguity in natural language queries and that feedback helps to improve retrieval performance. The user-in-loop experiment also outperformed the automated pseudo-relevance feedback method. Moreover, the null hypothesis of identical precision between initial retrieval and relevance feedback was rejected with high statistical significance (p ≪ 0.05). Compared to Word2Vec, TF-IDF, and bioBERT models, clinicalBERT works optimally considering the balance between response precision and user feedback. DISCUSSION: Our model works well for generic as well as COVID-19-specific queries. However, some generic queries are not answered as well as others because clustering reduces query performance and vague relations between queries and sentences are considered non-relevant. We also tested our model on queries with the same meaning but different expressions and demonstrated that these query variations yielded similar performance after incorporation of user feedback.
CONCLUSION: We developed an NLP-based query-bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User feedback is critical to improving model performance.


Subject(s)
COVID-19; Algorithms; Feedback; Humans; Information Storage and Retrieval; SARS-CoV-2
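The Rocchio update used above to fold user feedback into retrieval has a simple closed form: move the query embedding toward the centroid of sentences the user marked relevant and away from the non-relevant ones. The sketch below uses the textbook default weights; the paper does not report its parameter values, so alpha, beta, and gamma here are assumptions.

```python
import numpy as np

def rocchio_update(query_vec, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback on sentence embeddings:
    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    q = alpha * np.asarray(query_vec, float)
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)    # pull toward relevant centroid
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)  # push away from non-relevant
    return q
```

Re-ranking sentences by similarity to the updated vector after each feedback round is what drives the MAP improvements reported in the abstract.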
5.
NPJ Digit Med; 4(1): 94, 2021 Jun 03.
Article in English | MEDLINE | ID: covidwho-1260955

ABSTRACT

The strain on healthcare resources brought forth by the recent COVID-19 pandemic has highlighted the need for efficient resource planning and allocation through the prediction of future consumption. Machine learning can predict resource utilization, such as the need for hospitalization, based on past medical data stored in electronic medical records (EMR). We conducted this study on 3194 patients (46% male, mean age 56.7 (±16.8), 56% African American, 7% Hispanic) flagged as COVID-19-positive cases at 12 centers in the Emory Healthcare network from February 2020 to September 2020, to assess whether a COVID-19-positive patient's need for hospitalization can be predicted at the time of the RT-PCR test using EMR data collected before the test. Five main EMR modalities, i.e., demographics, medication, past medical procedures, comorbidities, and laboratory results, were used as features for predictive modeling, both individually and fused together using late, middle, and early fusion. Models were evaluated in terms of precision, recall, and F1-score (with 95% confidence intervals). The early fusion model is the most effective predictor, with an 84% overall F1-score [CI 82.1-86.1]. The predictive performance of the model drops by 6% when using recent clinical data while omitting the long-term medical history. Feature importance analysis indicates that a history of cardiovascular disease, emergency room visits in the year prior to testing, and demographic factors are predictive of the disease trajectory. We conclude that fusion modeling using medical history and current treatment data can forecast the need for hospitalization for patients infected with COVID-19 at the time of the RT-PCR test.
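The fusion strategies compared above differ mainly in where the modalities are combined. A minimal sketch of the two extremes, early fusion (concatenate features, train one model) and late fusion (train one model per modality, average their predicted probabilities), with illustrative names; middle fusion, which merges intermediate representations, sits between them:

```python
import numpy as np

def early_fusion(modalities):
    """Early fusion: concatenate per-modality feature matrices
    (demographics, medications, procedures, ...) into one input
    for a single downstream classifier."""
    return np.concatenate(modalities, axis=1)

def late_fusion(probabilities):
    """Late fusion: average the hospitalization probabilities emitted
    by separately trained per-modality models."""
    return np.mean(probabilities, axis=0)
```

In the study the early-fused representation gave the best F1-score, consistent with the classifier being able to learn cross-modality interactions that per-modality models never see.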

6.
Lancet Digit Health; 3(4): e241-e249, 2021 04.
Article in English | MEDLINE | ID: covidwho-1145027

ABSTRACT

BACKGROUND: Despite wide use of severity scoring systems for case-mix determination and benchmarking in the intensive care unit (ICU), the possibility of scoring bias across ethnicities has not been examined. Guidelines on the use of illness severity scores to inform triage decisions for allocation of scarce resources, such as mechanical ventilation, during the current COVID-19 pandemic warrant examination for possible bias in these models. We investigated the performance of the severity scoring systems Acute Physiology and Chronic Health Evaluation IVa (APACHE IVa), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA) across four ethnicities in two large ICU databases to identify possible ethnicity-based bias. METHODS: Data from the electronic ICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care III (MIMIC-III) database, built from patient episodes in the USA from 2014-15 and 2001-12, respectively, were analysed for score performance in Asian, Black, Hispanic, and White people after appropriate exclusions. Hospital mortality was the outcome of interest. Discrimination and calibration were determined for all three scoring systems in all four groups, using the area under the receiver operating characteristic (AUROC) curve for the different ethnicities to assess discrimination, and the standardised mortality ratio (SMR) or proxy measures to assess calibration. FINDINGS: We analysed 166 751 participants (122 919 eICU-CRD and 43 832 MIMIC-III). Although measurements of discrimination were significantly different among the groups (AUROC ranging from 0·86 to 0·89 [p=0·016] with APACHE IVa and from 0·75 to 0·77 [p=0·85] with OASIS), they did not display any discernible systematic patterns of bias.
However, measurements of calibration indicated persistent, and in some cases statistically significant, patterns of difference between Hispanic people (SMR 0·73 with APACHE IVa and 0·64 with OASIS) and Black people (0·67 and 0·68) versus Asian people (0·77 and 0·95) and White people (0·76 and 0·81). Although calibrations were imperfect for all groups, the scores consistently showed a pattern of overpredicting mortality for Black people and Hispanic people. Similar results were seen using SOFA scores across the two databases. INTERPRETATION: The systematic differences in calibration across ethnicities suggest that illness severity scores reflect statistical bias in their predictions of mortality. FUNDING: There was no specific funding for this study.


Subject(s)
Hospital Mortality/ethnology; Intensive Care Units; Racism; Risk Assessment/ethnology; Severity of Illness Index; Adolescent; Adult; Aged; Aged, 80 and over; Ethnicity; Female; Humans; Male; Middle Aged; Organ Dysfunction Scores; Racial Groups; Retrospective Studies; United States/epidemiology; Young Adult
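The calibration measure behind the bias finding above, the standardised mortality ratio, is just observed deaths divided by the deaths the score predicts; an SMR below 1 for a group means the score overpredicts that group's mortality. A minimal per-group sketch with illustrative names and toy data, not the study's analysis code:

```python
import numpy as np

def smr(observed_deaths, predicted_probs):
    """Standardised mortality ratio: observed deaths over the sum of
    model-predicted death probabilities. SMR < 1 => the score
    overpredicts mortality for that population."""
    return observed_deaths / np.sum(predicted_probs)

def smr_by_group(deaths, probs, groups):
    """Compute the SMR separately for each ethnic group label."""
    deaths, probs, groups = map(np.asarray, (deaths, probs, groups))
    return {g: smr(deaths[groups == g].sum(), probs[groups == g])
            for g in np.unique(groups)}
```

Consistently lower SMRs for Black and Hispanic patients than for Asian and White patients, as reported above, is exactly the pattern this per-group comparison is designed to expose.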